1. My home address is 216 Buccaneer Way, Mantoloking, New Jersey.



2. I received a Bachelor of Science degree in Mechanical Engineering with a minor in 
Electrical Engineering and a Masters degree in Mechanical Engineering from the 
University of Notre Dame. 



3. I received a Masters degree in Technology Management from Stevens Institute of 
Technology. 

4. I have extensive experience and expertise in: the communications industry, including the
difficult transition from manual to computer-based communications and message
switching systems and networks; real-time data collection and process-control systems;
defense acquisition systems; knowledge management systems; and commercial/business
applications.

5. In 1968 I founded Symbolic Systems, Inc., a company based in New Providence, New
Jersey that develops software for both government and industry. 

6. I am a frequent lecturer and have authored a number of technical papers, including 
Authoritative Data Source (ADS) Framework and ADS Maturity Model, Proceedings of 
the Ninth International Conference on Information Quality (ICIQ-04), November 5-7, 
2004, MIT, pp. 346-357, which is appended to this declaration.

7. There are three distinct parts to my System And Method For Signaling Quality And 
Integrity Of Data Content. They include: analyzing the content of preexisting digital data; 
grading/scoring/rating the results of the analysis without accessing the preexisting data; 
and presenting/labeling the grading/scoring/rating in one or more output forms without 
accessing the preexisting data. The elegance of this is that multiple
gradings/scorings/ratings of the results of the analysis can be done, each applying a different
set of rules without accessing the preexisting data, and each grading/scoring/rating
can be presented/labeled in a suitable manner for a particular use. This allows various
decisions regarding the suitability of the data to be made without needing access
to the preexisting data after the initial analysis step.

8. There is a pervasive need to have current, reliable, and trusted data from what are termed 
Authoritative Data Sources. This requirement has grown increasingly important in all 
industries and in particular in the military as it transforms. The transformation requires 
individual organizations, each using an array of independently developed stovepipe 
systems, to function on the battlefield with other weapons systems, services, and friendly 
nations, sharing communications networks and data. 

9. Organizations are increasingly relying on digital systems to conduct their business. These 
digital systems typically interoperate with other digital systems, both internal and external 
to the organization. As this reliance on digital systems has grown, so has the reliance on 
data that is provided by others. This data may be used or published as is or may be 
integrated and manipulated. The number of inter-system transactions and information 
exchanges has proliferated even more with the expansion of Internet use. Often the data 
provided by a source information provider is critical to the successful operation of the 
receiving organization. 

10. None of the cited prior art discloses the three distinct parts of: analyzing the content of 
preexisting digital data; grading/scoring/rating the results of the analysis without 
accessing the preexisting data; and presenting/labeling the grading/scoring/rating in one
or more output forms without accessing the preexisting data. 

11. I further declare that all statements made herein of my own knowledge are true and that 
all statements made on information and belief are believed to be true; and further that 
these statements were made with the knowledge that willful false statements and the like so
made are punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of 
the United States Code, and that such willful false statements may jeopardize the validity 
of the application, any patent issued thereon, or any patent to which this verified 
statement is directed. 
Authoritative Data Source (ADS) Framework
and ADS Maturity Model 1 

(Practice-oriented Paper) 

Frank J. Ponzio, Jr. 

Symbolic Systems, Inc., New Providence, New Jersey 
fponzio@symbolic.com

Abstract: Throughout the Federal Government, including the Department of Defense 
(DoD), there is a pervasive need to have current, reliable, and trusted data from what are 
termed Authoritative Data Sources (ADSs). This requirement has grown increasingly 
important as the military transforms. The transformation requires individual 
organizations, each using an array of independently developed stovepipe systems, to 
function on the battlefield with other weapons systems, services, and friendly nations, 
sharing communications networks and data. To ensure the accuracy of the data being 
provided by ADSs, new methodologies and metrics must be initiated. This paper 
proposes adoption of an Authoritative Data Source (ADS) Framework to analyze and 
adjudicate information quality issues prior to publishing data for consumer use and an 
ADS Maturity Model to rate data providers. 

Key Words: Information Quality, Data Quality, Authoritative Data Source, Data Quality Feedback, Adjudication, 
Maturity Model, Framework 

Introduction 

Many communities of interest, within and outside of the Federal Government, rely on ADSs for certain 
types of data. An ADS's product can range from simple lists of codes and associated names to complex
work products like architectures. Within the Department of Defense (DoD), for example, the areas of 
systems architecture, Command and Control (C2), and Situational and Battlefield Awareness, all need 
reliable, trusted data from ADSs for mission success. This need is heightened by the shift to net-centric
operations within DoD. 

This paper discusses the adoption of an Authoritative Data Source (ADS) Framework [4] in organizations 
that rely on external sources for data or that distribute their data to others. The author proposes that this 
framework be adopted to define a repeatable process for improving the quality of ADS data products. The 
paper also proposes that an ADS Maturity Model [4] be adopted to assess the quality and risks associated 
with a specific ADS product. A maturity model for data would provide a standard against which data 
sources could be assessed, similar to the Capability Maturity Model® for Software, which was broadly 
accepted as the de facto standard for assessing and improving software processes [2], [9]. 

Adopting these two models is a transformational, scalable solution for the ADS community. For ADS 
providers, it is a transition plan to ensure the high quality of data that is needed by all data consumer 
communities. For ADS product users, this solution offers a means to assess confidence in the quality of 
the data and determine the risks associated with the ADS product they are using. They thereby become
more knowledgeable users. 

This paper, in addition to proposing the ADS Framework and ADS Maturity Model, also presents some 
considerations, proof of concept experiences, lessons learned, challenges, and conclusions that are 
associated with this initiative. It does not detail all of the tasks (programs, checklists, reports, etc.) 
required to implement these models. 

Since the ADS Framework and ADS Maturity Model have recently been adopted by some organizations 
within the U.S. Army, this paper uses the military in its examples. The use of these models can be applied 
as a standard data quality improvement process for any government or commercial organization that relies 
on external source data for successful operation or that wishes to confirm the quality of the data that it 
disseminates to others. 

Background 

Organizations are increasingly relying on systems to conduct their business. These systems typically 
interoperate with other systems, both internal and external to the organization. As this reliance on systems 
has grown, so has the reliance on data that is provided by others. This data may be used or published as is 
or may be integrated and manipulated. The number of inter-system transactions and information 
exchanges has proliferated even more with the expansion of Internet use. Often the data provided by a 
source information provider is critical to the successful operation of the receiving organization. 

For example, the Department of Defense is transforming the military from individual service arms, each 
of which develops its own battlefield systems for its own use, to a more collaborative, inter-networked 
force. Not only are the digital systems within any one service now required to interoperate and share data 
among themselves, but also many systems developed by and for individual services are part of the joint 
battlefield's Tactical Internet. Because digital systems were initially developed as independent stovepipe
systems, their underlying databases and systems requirements are not standardized. In addition, the 
networking information they require to communicate with each other across the Tactical Internet also 
differs from system to system. Consequently, different battlefield systems rely on different, and in some 
instances, many Authoritative Data Sources (ADSs) to provide key information for the successful 
interoperability of their systems. An overview of how one organization in the U.S. Army has applied the 
ADS Framework and ADS Maturity Model and lessons learned from this initial proof of concept is 
covered later in this paper. 

To date, ADSs could be considered a "cottage industry," where many ADSs are providing a variety of 
data products using a multitude of methods with multiple risks regarding the data they are providing. The 
users of data from ADS products face risks when using this data. The following types of questions should 
be asked to help mitigate some risks: 

■ Am I using the same version of this data that everyone that I need to interoperate with is using? 

■ Should we all be using a later version? Does a later version exist?

■ Have I properly taken into account changes between versions of the ADS's data? 

■ Is one ADS's data consistent with the same data from other ADSs?

The problem facing most organizations is what, how, and when to address the risks. In a best-case 
scenario, everyone is taking the necessary actions to address these risks. However, this results in 
significant duplication of activities across organizations, probably with varying results. In a worst-case 
scenario, none of the risks are being addressed and wrong information is being used for system tests.
Unfortunately, everything "looks good" until ADS issues surface in integration tests, exercises, or when 
the data is required in a production environment. 

At the same time, many organizations, in addition to being data consumers, are also data providers. Just 
as they wish to protect themselves from risks associated with the data they consume, organizations may 
also wish to be viewed by their customers as providers of high-quality data. Providing products of poor 
data quality is costly. In addition to the potential for lawsuits, there are other associated costs (i.e.,
operational costs, lost business opportunities, and unnecessary expense associated with disseminating flawed
data) [3]. Additionally, providing high-quality data is in some cases a legal requirement, for example in the
Federal Government with the passage of the Data Quality Act [5]. ADSs can
adopt these models within their own organizations, prior to data publication. Adopting a framework
through which data is reviewed prior to distribution can limit the provider's liability, in addition to 
enhancing its customer relations. 



Mitigating Risk and Publishing Better Data Products 
Through Process Changes 

Organizations that rely on external information can integrate new processes into their operating
procedures to help mitigate the risks of both importing and disseminating bad data. This involves
setting up a framework under which data will be reviewed and data issues will be adjudicated, either with the
source provider prior to the data's use or with the data consumer prior to its dissemination. It also incorporates rating
the maturity of the data based on the analysis and other efforts made by the source to confirm its quality.
Consistently applying these new processes as part of a data quality standard operating procedure
will encourage ADSs to make every effort to provide a better quality product to their data consumers.

ADS Framework Model 

The proposed ADS Framework Model is presented in Figure 1.

Figure 1: Authoritative Data Sources (ADS) Framework Model

The ADS process receives input that an addition, modification, or deletion is required in an ADS product.
This change could be as simple as a change to a code list or as complex as a change to a multi-node 
network architecture that is used by others in their development plans. 

The updated ADS product is then submitted and subjected to an ADS analysis, the intent of which is to 
improve and validate the quality of the ADS product and to reduce the risks identified above for users of 
the ADS product. The ADS analysis can draw on prior versions of the ADS product and, if available, data
from other sources. It might include a comparison and contrast
analysis using the prior version and a uniformity and consistency analysis using data from other sources.
The results of the ADS analysis are then submitted to an adjudication process. The adjudication process
would be performed by the developers of the ADS product and Subject Matter Experts (SMEs), who
are generally users of the data, possibly in the form of a data review board, to scrub the results.
Adjudication would ascertain if, based on the input that was provided to the ADS process, the ADS 
results are acceptable or unacceptable. The unacceptable results would be used as feedback to the ADS 
process. This feedback loop process would continue until only acceptable results are achieved. 
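
The feedback loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the framework itself: the names `Finding`, `run_ads_process`, `analyze`, `adjudicate`, and `revise` are hypothetical, and the round limit is an added safeguard that the paper does not specify.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One issue raised by the ADS analysis (hypothetical structure)."""
    element: str
    description: str


def run_ads_process(product, analyze, adjudicate, revise, max_rounds=10):
    """Repeat analyze -> adjudicate -> revise until nothing is rejected.

    analyze(product)        returns a list of findings
    adjudicate(findings)    returns the findings judged unacceptable
    revise(product, issues) returns an updated product
    The max_rounds limit is an assumption added to guarantee termination.
    """
    for round_number in range(1, max_rounds + 1):
        findings = analyze(product)
        rejected = adjudicate(findings)
        if not rejected:
            # Only acceptable results remain: ready to publish.
            return product, findings, round_number
        # Unacceptable results are fed back to the ADS process.
        product = revise(product, rejected)
    raise RuntimeError("adjudication did not converge")
```

In this sketch the three callables stand in for the analysis programs, the data review board, and the ADS provider's correction step, so the same loop can be reused for a simple code list or a complex architecture product.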

At this point, the ADS product and the ADS analysis results would be published and appropriate alerts 
distributed to users. 

The use of the ADS Framework model is transformational for both the ADS provider and user. For the 
ADS provider, the ADS analysis negates the resource burden of developing and publishing an analysis. 
The adjudication process adds additional expertise to the process and expands the sphere of knowledge 
associated with the ADS product. For the user, who relies on the ADS product, this is the equivalent of an 
Underwriters Lab (UL) [6] approval of the results with full transparency and disclosure of the ADS 
product and ADS analysis. 

Another part of the ADS Framework is to have each ADS provide an ADS analysis with each version of 
the product that contains the following information: 

■ The details of the additions, changes, and deletions between this version and the prior version. 

■ The types of internal quality assurance validations that have been performed on the product. These
would include duplication, consistency, and uniformity checks, among others.

■ The location of other sources of which they are aware, who provide the same or similar information. 
Users could decide if, when, and how to use this information as part of their risk mitigation plans. 

This information is captured in tag information associated with each ADS product. This disclosure to 
potential users would be used as part of the ADS risk assessment. 2 
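
As a rough illustration of the three items listed above, the following Python sketch builds the tag information for a simple code list. It assumes the product is a mapping from codes to names; the function names, the tag layout, and the single duplication check shown are hypothetical choices made for this example.

```python
from collections import Counter


def diff_versions(prior, current):
    """Report additions, changes, and deletions between two versions."""
    added = {k: current[k] for k in current.keys() - prior.keys()}
    deleted = {k: prior[k] for k in prior.keys() - current.keys()}
    changed = {
        k: (prior[k], current[k])
        for k in prior.keys() & current.keys()
        if prior[k] != current[k]
    }
    return {"added": added, "changed": changed, "deleted": deleted}


def duplicate_names(current):
    """Duplication check: names that appear under more than one code."""
    counts = Counter(current.values())
    return sorted(name for name, n in counts.items() if n > 1)


def build_tag(version, prior, current, other_sources):
    """Assemble the tag information published with an ADS product."""
    return {
        "version": version,
        "changes": diff_versions(prior, current),
        "checks": {"duplicate_names": duplicate_names(current)},
        "other_sources": list(other_sources),
    }
```

A consumer could read the `changes` entry to decide whether a new version affects its own systems before importing it.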

Expanding the ADS Framework Model 

The framework is scalable as a part of a semantic heterogeneity 3 process. This multi-tiered approach is 
helpful to resolve questions, conflicts, or issues that arise when there are multiple data source providers. 
When multiple sources supply shared data elements, the data product characteristics in each source and in
the receiving application are cross-referenced to the common data architecture. A crosswalk analysis
matrix, also referred to as a data feed table [1], is developed. For example, if two sources provide
addressing information, the following types of analyses are performed.



2 It is important to understand that publication of ADS data is only a snapshot in time. In all likelihood, 
the updated additions, changes, and deletions were in effect before the publication was issued. 

3 The identification of semantically or conceptually related objects in different databases. [6] 
■ A crosswalk analysis between data products to determine data elements that are shared, and if found,
any naming conflicts and attribute differences. For example, if one source uses the naming 
convention SYSTEM ID and another uses ID SYSTEM, these may be referring to the same data 
element and require mapping. Furthermore, if one source provides addressing information that is 
incomplete without additional addressing information from the other source, inter-product analysis 
may be required when one source provides updated data. 

■ For common data elements with different naming conventions or field attributes, a set of mapping 
tables for standardizing names and field attributes prior to import into the recipient database. 

■ When data is published, a comparison of the data that is shared between sources in order to highlight 
any differences or inconsistencies between data sources or conflicts in dependencies between related 
data elements from data sources. 
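
The analyses above can be illustrated with a small Python sketch. It assumes that two element names refer to the same data element when they contain the same words in any order, as in the SYSTEM ID and ID SYSTEM example; that matching rule, and the function names `normalize`, `crosswalk`, and `compare_shared`, are assumptions for illustration. In practice the mapping tables would likely be reviewed and maintained by the community rather than derived automatically.

```python
def normalize(name):
    """Treat names as equal when they contain the same words in any order."""
    return frozenset(name.upper().replace("_", " ").split())


def crosswalk(source_a, source_b):
    """Map element names in source_a to the matching names in source_b."""
    by_key = {normalize(name): name for name in source_b}
    mapping = {}
    for name in source_a:
        match = by_key.get(normalize(name))
        if match is not None:
            mapping[name] = match
    return mapping


def compare_shared(records_a, records_b, mapping):
    """List shared elements whose values differ between the two sources."""
    differences = []
    for name_a, name_b in mapping.items():
        if records_a[name_a] != records_b[name_b]:
            differences.append(
                (name_a, name_b, records_a[name_a], records_b[name_b])
            )
    return differences
```

The output of `compare_shared` corresponds to the differences that would be highlighted when data is published and then resolved through adjudication.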

Figure 2 shows how the ADS Framework Model in Figure 1 can be expanded to address
multiple data sources.
Figure 2: Authoritative Data Sources (ADS) Framework Model scaled to address multiple data sources.

Implicit in a two-tier framework, as in the figure above, is a Community of Interest (COI). The
framework now involves, at a minimum, three and potentially more stakeholders: the data recipient and
the data sources whose products were included in the crosswalk analysis. Any data
issues are resolved as a community, since any or all of the data sources may be affected.

ADS Maturity Model (AMM) 

The use of a maturity model is beneficial for everyone — ADS providers and users. It provides an implied 
transition plan for each ADS to strive to improve its products. It provides users with the knowledge of 
what level of maturity is associated with the ADS product. In situations where there are multiple sources 
for the same or equivalent ADS product, users could use the maturity model level designation of the 
source to determine which source to use and/or in which source they have the most trust. 



The maturity model has levels that would indicate the risk management steps that were addressed by the 
ADS provider. For example, if there is a five-level model, the maturity levels might be determined as 
follows: 



Maturity
Level      Risk Management Steps Taken                            Information Source
---------  -----------------------------------------------------  ----------------------------
0          No ADS analysis is provided.
1          The adds, changes, and deletions between successive    Provided in the ADS analysis
           versions are provided and approved by the provider.
2          Duplication, consistency, and uniformity checks were   Provided in the ADS analysis
           performed.
3          The ADS results have been scrubbed by at least one     Provided in the adjudication
           SME.                                                   process
4          Multiple SMEs and multiple users have accepted the     Provided in the adjudication
           ADS analysis results.                                  process

Table 1: The ADS Maturity Model



The AMM level also captures tag information to be used by users as part of the ADS risk assessment. 
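
One possible way to derive the level in Table 1 from the recorded risk management steps is sketched below in Python. The sketch assumes that the levels are cumulative (each level requires the steps of all lower levels) and that "multiple" means at least two; neither assumption is stated in the table, and the function and parameter names are hypothetical.

```python
def maturity_level(changes_documented, quality_checks_done,
                   sme_reviews, user_acceptances):
    """Return an AMM level from 0 to 4 for one ADS product version.

    Assumes cumulative levels and reads "multiple" as two or more.
    """
    if not changes_documented:
        return 0
    if not quality_checks_done:
        return 1
    if sme_reviews < 1:
        return 2
    if sme_reviews < 2 or user_acceptances < 2:
        return 3
    return 4


def most_mature_source(levels_by_source):
    """Pick the source with the highest maturity level."""
    return max(levels_by_source, key=levels_by_source.get)
```

A user choosing among several sources for an equivalent product could compare the levels with `most_mature_source`, although the level is only one input to the overall risk assessment.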

Example 

Because of the broad and in-depth interest in the use of architectures within the Federal Government, an
architecture example is used below to assist readers in determining the applicability of
this ADS Framework to their data sources.

The organization producing a data-based architecture would be considered an ADS to the organization 
using that architecture in its systems or applications to build a product. An architecture may consist of 
multiple components that typically present various "views" of the architecture. For example, in the DoD 
Architecture Framework there are multiple operational, system, and technical views of the architecture [8].

A change to an architecture can affect the network structure, which impacts battlefield communications. 
Therefore, adjudication of any changes by all stakeholders is essential to ensure that all affected parties 
have had input and are aware of the impact. In addition, a framework is also essential to ensure that a 
change in any one view is reflected in all other published views. 

For example, if an existing radio were replaced with a radio that uses a new technology and provides 
additional capabilities, this could have broad impact on the architecture. The group creating the 
architecture would develop a new architecture version reflecting the new radio and the necessary 
connectivity changes in the various views. The ADS analysis would detect all of the changes made in all
of the views, comparing them to the prior version of the architecture and other authoritative sources. The 
stakeholders, to ensure that all changes required for the new radio have been reflected in the new 
architecture version, would submit these ADS results for adjudication. In addition, they would ensure that 
the desired communication functionality will be achieved by the warfighter. If there were multiple SMEs 
and users involved in the adjudication process, the organization that developed the new architecture (the 
ADS) would be at a maturity level 4. 



Concept Validation 

Our company has been heavily involved in the Army's efforts to produce initialization data for the 
digitized weapons systems now being deployed. As described earlier in this paper, these systems rely on 
accurate addressing and networking information for routing communications and C2 information via a 
Tactical Internet. The network and addressing information is provided by a number of sources within the 
Army and DoD to the group in charge of providing the initialization data loads. In order to produce the 
data loads for the various digitized battlefield systems, the information is imported into a database, 
manipulated and enhanced, and then exported in a number of different file formats with different field and 
attribute requirements to meet the unique requirements of the individual weapons system. 

Figure 3 shows how the ADS Framework is applied at this site on both the input and output sides of the 
system. Data products received from ADSs are analyzed and adjudicated before they are integrated into 
the database. Data products produced from the database are analyzed, posted for review, and adjudicated 
by a Data Review Board before being published. Action items that are the outcome of the adjudication 
process are assigned to the original ADS, the initialization data product developer, or the ultimate 
consumer, as applicable. 
Figure 3: ADS Framework applied both on the input and the output sides of the system.



Applying the ADS Framework to ADS Data 

As leads on the Data Harmonization team, we discovered early in our engagement that data provided by 
ADSs was not necessarily accurate or complete. In addition, we discovered that some data elements 
common to more than one ADS product were conflicting. Some sources, in order to meet their deadlines, 
provided what they had at the time it was required without doing any quality checks. Others made
undisclosed data changes between versions that affected other systems and programs. 

The Data Harmonization team developed a number of analyses, specific to the ADS providing the data, in 
order to ensure that data is accurate before importing it into the initialization database. When new source 
data was provided, analysis reports were produced and a feedback/correction cycle continued until the 
provided data was accepted. Hence, an informal ADS Framework was in operation. This informal 
mechanism led to formalizing the process by developing and publishing to all ADSs and stakeholders the 
ADS Framework process that would be used to analyze and adjudicate issues related to the source data 
upon which we relied. 

Applying the ADS Framework to Data Products before Publication 

Our customer, being a data product publisher as well as a data consumer, was anxious to provide each 
systems developer accurate data load information. The goal was to reduce the number of data-caused 
errors when the systems and networks were tested for interoperability on the test floor and to minimize 
any communications errors caused by incorrect network or communications addressing data on the 
battlefield. In the former case, both time and money are wasted if data error and correction cycles extend 
the testing cycle; in the latter case data accuracy can determine whether a soldier can communicate or a 
field commander receives complete and correct information for making battlefield decisions. Hence, data 
errors can cost lives. The customer's goal was to be a maturity level 4 data provider. To this end, we 
applied the ADS Framework to the developed data products through a formal Data Review Board. 

The board, which meets whenever there is a new data load release, reviews with the battlefield systems 
developers the data that will be provided for their systems and reviews with the operations group from the 
test floor the proposed network. Prior to a Data Review Board meeting, the data to be incorporated in the 
final files are analyzed and views of the file data are developed for each consumer. All stakeholders, 
including the system developer SMEs, are invited to participate either in person or by teleconference. The 
analysis provided and SME participation in the process rate the data provided in the data loads at a
maturity level of 4. The success of the Data Review Board with its adoption of this quality process has 
virtually eliminated data-related error reports from interoperability testing. 

Developing a Data Product Integration Architecture 

In order to help determine the analysis requirements for data products (both provided by ADSs and 
published by our customer), we developed a data product integration plan based on the DoD Architecture 
Framework[8]. However, because the DoDAF was primarily developed to document systems, not data, 
additional views expanding the details of how data elements flow through the process as well as map to 
and interact with other data elements were added. The key systems views are described
below.

Data Exchange Matrix 

This system view documents information about the data/information we exchange with ADSs. The 
communications paths included in the matrix followed the process of receiving and of feeding back 
information. In most cases, the source system element was specified as a particular application or 
database, but in other cases the system was identified as a telephone. (This is used in some cases as the
feedback mechanism for issues about a received data file.)

Developing this matrix proved helpful for analyzing whether there were any inefficiencies in the methods 
used to exchange information and data files with our ADSs. 
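
A data exchange matrix of this kind could be represented along the following lines. This Python sketch is illustrative only: the `Exchange` fields and the two helper functions are hypothetical and do not reproduce the columns of the matrix that was actually developed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Exchange:
    """One row of a data exchange matrix (hypothetical fields)."""
    source: str       # sending system element
    destination: str  # receiving system element
    content: str      # what is exchanged
    medium: str       # e.g. "application", "database", "telephone"


def exchanges_by_medium(matrix):
    """Group the rows of the matrix by the medium they use."""
    grouped = defaultdict(list)
    for row in matrix:
        grouped[row.medium].append(row)
    return dict(grouped)


def manual_feedback_paths(matrix, manual_media=("telephone",)):
    """Rows that rely on a manual medium, a possible source of inefficiency."""
    return [row for row in matrix if row.medium in manual_media]
```

Listing the manual paths is one simple way to look for the kind of inefficiencies mentioned above.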